A-posteriori Provenance-enabled Linking of Publications and Datasets via Crowdsourcing

نویسندگان

  • Laura Dragan
  • Markus Luczak-Rösch
  • Elena Paslaru Bontas Simperl
  • Heather S. Packer
  • Luc Moreau
  • Bettina Berendt
چکیده

This paper aims to share with the digital library community different opportunities to leverage crowdsourcing for a-posteriori capturing of dataset citation graphs. We describe a practical approach, which exploits one possible crowdsourcing technique to collect these graphs from domain experts and proposes their publication as Linked Data using the W3C PROV standard. Based on our findings from a study we ran during the USEWOD 2014 workshop, we propose a semi-automatic approach that generates metadata by leveraging information extraction as an additional step to crowdsourcing, to generate high-quality data citation graphs. Furthermore, we consider the design implications on our crowdsourcing approach when non-expert participants are involved in the process.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Crowdsourcing data citation graphs using provenance

In this paper we describe a tool designed to support crowdsourcing a-posteori provenance information about the datasets used in research publications. It generates PROV data both to capture the data citation graphs—via an extension to the PROV Data Model, and the crowdsourcing process—via prov:bundles.

متن کامل

Perform Three Data Mining Tasks with Crowdsourcing Process

For data mining studies, because of the complexity of doing feature selection process in tasks by hand, we need to send some of labeling to the workers with crowdsourcing activities. The process of outsourcing data mining tasks to users is often handled by software systems without enough knowledge of the age or geography of the users' residence. Uncertainty about the performance of virtual user...

متن کامل

DEMO: A Lightweight Provenance Pingback and Query Service for Web Publications

Web resources, such as publications, datasets, pictures and others can be directly linked to their provenance data, as described in the specification about Provenance Access and Query (PROV-AQ) by the W3C. On its own, this approach places all responsibility with the publisher of the resource, who hopefully maintains and publishes provenance information. In reality, however, most publishers lack...

متن کامل

Crowdsourcing Protein Family Database Curation

We propose a novel method for crowdsourcing a protein family database. We discuss how we intend to identify novel groupings of proteins from user sequence similarity search, and how text mining will be applied to assist in annotation of these novel groupings, and more broadly as an enrichment of protein sequence similarity search results. We intend to use entity linking to identify literature w...

متن کامل

Estimating the Parameters for Linking Unstandardized References with the Matrix Comparator

This paper discusses recent research on methods for estimating configuration parameters for the Matrix Comparator used for linking unstandardized or heterogeneously standardized references. The matrix comparator computes the aggregate similarity between the tokens (words) in a pair of references. The two most critical parameters for the matrix comparator for obtaining the best linking results a...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • D-Lib Magazine

دوره 21  شماره 

صفحات  -

تاریخ انتشار 2015